Clustering consistency in neuroimaging data analysis

Author: Abu-Jamous B
Brattico E
Liu C
Nandi A
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

Clustering techniques have been applied to neuroscience data analysis for decades, and new algorithms keep being developed and applied to address different problems. In practice, however, it is often hard to select an appropriate algorithm and to evaluate the quality of clustering results, because the ground truth is unknown. Conclusions drawn from a single algorithm may also be biased, because each algorithm makes its own assumptions about the structure of the data, and these may not match the real data. In this paper, we explore the benefits of integrating the clustering results from multiple clustering algorithms by a tunable consensus clustering strategy and demonstrate the importance and necessity of consistency in neuroimaging data analysis.

Archivio istituzionale della ricerca - Università di Bari

Exploration of distance metrics in consensus clustering analysis of FMRI data

Author: Abu-Jamous B
Brattico E
Liu C
Nandi A
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Clustering techniques have gained great popularity in neuroscience data analysis, especially for analysing data from complex experimental paradigms where traditional model-based methods are hard to apply. However, many clustering algorithms are now available, and even within a single algorithm, choices such as parameter settings and distance metrics are very likely to affect the final clustering results. In our previous work, we demonstrated the benefits of integrating clustering results from multiple clustering algorithms, which provides more stable, reproducible, and complete clustering solutions. In this paper, we further inspect the possible influence of the choice of distance metric in clustering analysis.
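
The point about distance metrics can be illustrated with a small sketch. This is not the analysis from the paper: the three short profiles and the function names are made up for illustration, and Euclidean and correlation distance are simply two common choices assumed here. The sketch shows that the same query profile can have a different nearest neighbour depending on the metric, which is the kind of effect that propagates into clustering results.

```python
import math

def euclidean(a, b):
    """Euclidean distance: sensitive to differences in magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation: sensitive to shape, not magnitude.
    Assumes neither profile is constant (otherwise the norm is zero)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    cov = sum(x * y for x, y in zip(dev_a, dev_b))
    norm = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return 1.0 - cov / norm

def nearest(query, candidates, distance):
    """Return the name of the candidate profile closest to the query."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))

# Hypothetical profiles: one has the same shape as the query at twice the
# amplitude, the other has similar values in a different order.
query = [1.0, 2.0, 3.0]
candidates = {"scaled": [2.0, 4.0, 6.0], "shuffled": [1.0, 3.0, 2.0]}

print(nearest(query, candidates, euclidean))
print(nearest(query, candidates, correlation_distance))
```

Under Euclidean distance the "shuffled" profile is closer, while under correlation distance the "scaled" profile is closer, so a clustering algorithm built on either metric would group the query differently.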

Archivio istituzionale della ricerca - Università di Bari

Public Library of Science (PLOS)

Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery

Author: A Strehl
A Weingessel
Asoke K. Nandi
B Abu-Jamous
B Abu-Jamous
B Fischer
Basel Abu-Jamous
D Greene
D Liu
D Stuart
David J. Roberts
E Dimitriadou
E Dimitriadou
FD Gibbons
HG Ayad
JM Pena
K Tumer
KY Yeung
LP Zhao
MBH Rhouma
N Slonim
O Nwamadi
PT Spellman
R Avogadri
R Babuška
R Baumgartner
R Fa
R Nilsson
RJ Cho
Rui Fa
S Dudoit
S Haykin
S Vega-Pons
S Vega-Pons
SA Salem
Shyamal D. Peddada
T Pramila
TE Kohonen
X Zhou
Z Yu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Copyright © 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies. National Institute for Health Researc
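
The aggregate-then-binarize idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the cluster labels are already aligned across runs (label k means the same cluster in every partition), it uses a single hypothetical threshold rule in place of the paper's tunable binarization techniques, and all function names and data are invented for the example.

```python
def consensus_partition_matrix(partitions, n_clusters):
    """Fuzzy consensus matrix: rows are clusters, columns are genes.
    Each entry is the fraction of partitions placing that gene in that cluster.
    Assumes cluster labels are already aligned across partitions."""
    n_genes = len(partitions[0])
    counts = [[0] * n_genes for _ in range(n_clusters)]
    for labels in partitions:
        for gene, k in enumerate(labels):
            counts[k][gene] += 1
    return [[c / len(partitions) for c in row] for row in counts]

def binarize(copam, threshold):
    """Assign a gene to every cluster whose membership reaches the threshold
    (an assumed rule: high thresholds give tight clusters, low ones wide)."""
    return [[1 if m >= threshold else 0 for m in row] for row in copam]

def assignment_summary(binary):
    """Split gene indices into unassigned, exclusive and multiple membership."""
    summary = {"unassigned": [], "exclusive": [], "multiple": []}
    for gene in range(len(binary[0])):
        n = sum(row[gene] for row in binary)
        key = "unassigned" if n == 0 else "exclusive" if n == 1 else "multiple"
        summary[key].append(gene)
    return summary

# Four hypothetical clustering runs over five genes and two clusters.
partitions = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]
copam = consensus_partition_matrix(partitions, n_clusters=2)
tight = binarize(copam, threshold=1.0)   # only unanimous assignments survive
wide = binarize(copam, threshold=0.25)   # any supported assignment is kept

print(assignment_summary(tight))
print(assignment_summary(wide))
```

With the strict threshold, genes on which the runs disagree end up unassigned, while the loose threshold lets the same genes belong to both clusters at once. A single tuning parameter therefore covers the three eventualities the abstract lists.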

CiteSeerX

Jyväskylä University Digital Archive

UCL Discovery

Jyväskylä University Digital Archive

FigShare

UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets

Author: A Huber
A Prelić
AA Shabalin
AP Gasch
Asoke K. Nandi
B Abu-Jamous
B Abu-Jamous
Basel Abu-Jamous
C Koch
CH Wade
CT Harbison
D Dikicioglu
D Liu
DA Orlando
David J. Roberts
IS Dhillon
J Bahler
J Yang
JK Choi
JK Limb
JM Pena
JM Stuart
KC Li
KC Li
KY Yeung
KY Yeung
L Lazzeroni
LP Zhao
MB Eisen
P Cahan
P Grandi
PC Roberts
PT Spellman
R Fa
R Lletí
R Nilsson
RJ Cho
RM Piro
Rui Fa
S Chu
S Fujii
S Sharma
S Vega-Pons
T Hayata
T Murali
T Pramila
TC Fleischer
VA Gennarino
X Liu
Y Cheng
Y Kluger
Z Tao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/06/2015

Background: Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Results: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts. Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. Conclusions: The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies. The National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004)

Springer - Publisher Connector

Public Library of Science (PLOS)

SMART: Unique splitting-while-merging framework for gene clustering

Author: A Thalamuthu
AD Lanterman
AE Teschendorff
AK Jain
Asoke K. Nandi
B Abu-Jamous
B Fritzke
B Fritzke
CR Lin
CS Wallace
D Dembele
D Jiang
David J. Roberts
G Celeux
H Akaike
J Qin
J Rissanen
KY Yeung
L Hubert
L Mavridis
L Zhao
MAT Figueiredo
P Tamayo
PT Spellman
R Xu
R Xu
RJ Cho
Rui Fa
S Bandyopadhyay
S Monti
S Wu
Sergio Gómez
T Kohonen
T Pramila
TR Golub
WM Rand
YJ Zhang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/04/2014

Copyright © 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms. National Institute for Health Researc

Cmr1/WDR76 defines a nuclear genotoxic stress body linking genome integrity and protein quality control

Author: A Khmelinskii
A Michalski
A Pellicioli
AA Alcasabas
AC Gavin
AH Tong
AH Tong
AJ McClellan
AJ Osborn
B Abu-Jamous
BM O'Neill
C Alabert
C Dekker
C Dekker
C Hoege
CH Eskiw
CH Eskiw
CM Fong
D Branzei
D Kaganovich
DC Schwartz
DE Lea
DH Choi
DP Toczyski
DP Toczyski
EA Winzeler
EM Sontag
G Pappenberger
H Shimodaira
HT Tran
I Petrovska
I Psakhye
J Cox
J Cox
J Klermund
J Pines
J Rappsilber
J Shimizu
J Tyedmers
JC Dittmar
JM Gilmore
JM Tkach
K Ask
K Labib
K Yoshikawa
KW Yuen
L Crabbe
L Gellon
L Malinovska
M Haslbeck
MK Sung
MK Zubko
N Erdeniz
R Narayanaswamy
R Reid
R Spokoini
R Tao
S Maere
S Mimura
S Specht
SE Ong
SR Collins
T Abbas
T Naiki
V Lallemand-Breitenbach
W Condemine
WK Huh
XH Wu
Z Jasencakova
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

DNA replication stress is a source of genomic instability. Here we identify changed mutation rate 1 (Cmr1) as a factor involved in the response to DNA replication stress in Saccharomyces cerevisiae and show that Cmr1—together with Mrc1/Claspin, Pph3, the chaperonin containing TCP1 (CCT) and 25 other proteins—define a novel intranuclear quality control compartment (INQ) that sequesters misfolded, ubiquitylated and sumoylated proteins in response to genotoxic stress. The diversity of proteins that localize to INQ indicates that other biological processes such as cell cycle progression, chromatin and mitotic spindle organization may also be regulated through INQ. Similar to Cmr1, its human orthologue WDR76 responds to proteasome inhibition and DNA damage by relocalizing to nuclear foci and physically associating with CCT, suggesting an evolutionarily conserved biological function. We propose that Cmr1/WDR76 plays a role in the recovery from genotoxic stress through regulation of the turnover of sumoylated and phosphorylated proteins

Copenhagen University Research Information System

King's Research Portal

Queen Mary Research Online

Sussex Research Online

In vitro downregulated hypoxia transcriptome is associated with poor prognosis in breast cancer

Author: A Grothey
A Lachmann
A McIntyre
A Ortiz-Barahona
Adrian L. Harris
AL Harris
AR Green
Asoke K. Nandi
B Abu-Jamous
B Abu-Jamous
B Bolstad
B Harris
Basel Abu-Jamous
C Camps
C Curtis
C Shen
C Ward
CV Dang
D Tabas-Madrid
DC Singleton
DR Mole
DT Jones
E Favaro
E Favaro
EY Chen
EY Tan
FM Buffa
FM Buffa
Francesca M. Buffa
G Ciriello
G Multhoff
GL Semenza
GP Elvidge
H Zhang
HA Askautrud
I Kümler
I Papandreou
I Papandreou
J Quackenbush
J Schödel
J Yang
J-W Kim
JD Gordan
JMR Danielsen
JS Lee
JS Lee
JS Park
K Rock De
L Giot
L-C Lai
LD Miller
M Koritzinsky
M Koshiji
M Tafani
M Varjosalo
M Varjosalo
M-Z Wu
MC Brahimi-Horn
MV Kuleshov
NS Fox
P Carmeliet
P Carmona-Saez
PG Corn
R Bernardi
R Krutilina
R Nogales-Cadenas
R Shen
RH Wenger
SA Wagner
T Rzymski
T Takahashi
X Chen
X Lu
X Tang
X Xia
Y Benita
Y Wang
ZE Stine
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

© The Author(s), 2017. Background: Hypoxia is a characteristic of breast tumours indicating poor prognosis. Based on the assumption that those genes which are up-regulated under hypoxia in cell-lines are expected to be predictors of poor prognosis in clinical data, many signatures of poor prognosis were identified. However, it was observed that cell line data do not always concur with clinical data, and therefore conclusions from cell line analysis should be considered with caution. As many transcriptomic cell-line datasets from hypoxia related contexts are available, integrative approaches which investigate these datasets collectively, while not ignoring clinical data, are required. Results: We analyse sixteen heterogeneous breast cancer cell-line transcriptomic datasets in hypoxia-related conditions collectively by employing the unique capabilities of the method, UNCLES, which integrates clustering results from multiple datasets and can address questions that cannot be answered by existing methods. This has been demonstrated by comparison with the state-of-the-art iCluster method. From this collection of genome-wide datasets, which include 15,588 genes, UNCLES identified a relatively high number of genes (>1000 overall) which are consistently co-regulated over all of the datasets, and some of which are still poorly understood and represent new potential HIF targets, such as RSBN1 and KIAA0195. Two main, anti-correlated, clusters were identified; the first is enriched with MYC targets participating in growth and proliferation, while the other is enriched with HIF targets directly participating in the hypoxia response. Surprisingly, in six clinical datasets, some sub-clusters of growth genes are found consistently positively correlated with hypoxia response genes, unlike the observation in cell lines. Moreover, the ability to predict bad prognosis by a combined signature of one sub-cluster of growth genes and one sub-cluster of hypoxia-induced genes appears to be comparable to, and perhaps greater than, that of known hypoxia signatures. Conclusions: We present a clustering approach suitable to integrate data from diverse experimental set-ups. Its application to breast cancer cell line datasets reveals new hypoxia-regulated signatures of genes which behave differently when in vitro (cell-line) data is compared with in vivo (clinical) data, and are of a prognostic value comparable to or exceeding that of the state-of-the-art hypoxia signatures. Dr. Abu-Jamous would like to acknowledge the financial assistance from Brunel University London. Professors Buffa and Harris acknowledge support from Cancer Research UK, EU framework 7, and the Oxford NIHR Biomedical Research Centre. Professor Harris acknowledges support from the Breast Cancer Research Foundation. Professor Nandi would like to acknowledge that this work was partly supported by the National Science Foundation of China grant number 61520106006 and the National Science Foundation of Shanghai grant number 16JC1401300. The funding bodies have no role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.

Oxford University Research Archive

Traditional knowledge of wild edible plants used in Palestine (Northern West Bank): A comparative study

Author: A Abu-Rabia
A Abu-Rabia
A Della
A Keys
A Pieroni
A Pieroni
A Verde López
ACH Hadjichambis
Aseel A Musleh
BA Meilleur
BM Ogle
Buthainah A Isa
D Rivera
DM Esawi
F Ertug
Fatemah A Kherfan
Food and Agriculture Organization of the United Nations (FAO)
G Crowfoot
G Galeano
G Picchi
GE Post
Ghadah M Swaiti
H Azaizeh
Hanadi A Nasrallah
Hanan M Herzallah
Hebah K Azem
IK Khafagi
Isra' S Khdair
Israa M Soos
J Tardío
Jehan H Al-Shafie'
JR Stepp
K Balemie
Kifayeh H Qarariah
M Heinrich
M Pardo-de-Santayana
M Zohary
MA Bonet
Maha M Haj-Ali
MG Paelotti
MHC Forbs
Mohammed S Ali-Shtayeh
MP Ghirardini
MS Ali-Shtayeh
MS Ali-Shtayeh
MS Ali-Shtayeh
MS Ali-Shtayeh
MS Ali-Shtayeh
Muna A Abuzahra
MÀ Bonet
N Etkin
Nehaya A Saifi
O Said
R Dafni
Rana M Jamous
Rasha B Khlaif
Samiah M Aiash
T Johns
U Plitmann
V Heywood
V Heywood
Wafa' A Elgharabah
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: A comparative food ethnobotanical study was carried out in fifteen local communities distributed in five districts in the Palestinian Authority, PA (northern West Bank), six of which were located in Nablus, two in Jenin, two in Salfit, three in Qalqilia, and two in Tulkarm. These are among the areas in the PA whose rural inhabitants primarily subsisted on agriculture and therefore still preserve the traditional knowledge on wild edible plants. Methods: Data on the use of wild edible plants were collected over a one-year period, through informed consent semi-structured interviews with 190 local informants. A semi-quantitative approach was used to document use diversity and the relative importance of each species. Results and discussion: The study recorded 100 wild edible plant species, 76 of which were mentioned by three or more informants and were distributed across 70 genera and 26 families. The most significant species include Majorana syriaca, Foeniculum vulgare, Malva sylvestris, Salvia fruticosa, Cyclamen persicum, Micromeria fruticosa, Arum palaestinum, Trigonella foenum-graecum, Gundelia tournefortii, and Matricaria aurea. All ten species with the highest mean cultural importance values (mCI) were cited in all five areas. Moreover, most were important in every region. A common cultural background may explain these similarities. One taxon (Majorana syriaca) in particular was found to be among the most quoted species in almost all areas surveyed. CI values, as a measure of traditional botanical knowledge, for edible species in relatively remote and isolated areas (Qalqilia and Salfit) were generally higher than for the same species in other areas. This can be attributed to the fact that local knowledge of wild edible plants and plant gathering are more widespread in remote or isolated areas. Conclusion: Gathering, processing and consuming wild edible plants are still practiced in all the studied Palestinian areas. About 26% (26/100) of the recorded wild botanicals, including the most quoted and those with the highest mCI values, are currently gathered and utilized in all the areas, demonstrating that there are ethnobotanical contact points among the various Palestinian regions. The habit of using wild edible plants is still alive in the PA, but is disappearing. Therefore, the recording, preserving, and infusing of this knowledge to future generations is pressing and fundamental.

Springer - Publisher Connector

Molecular signatures of the rediae, cercariae and adult stages in the complex life cycles of parasitic flatworms (Digenea: Psilostomatidae)

Author: A Allam
A Bateman
A Coghlan
A Minelli
A Moszczynska
AI Granovitch
AM Bolger
AV Protasio
B Abu-Jamous
B Buchfink
B Fuchs
B Li
C Cantacessi
C Cantacessi
C Gallardo-Escárate
DM Emms
DM Emms
DP Benesh
EJR Vasconcelos
EJR Vasconcelos
F Ronquist
F Xu
G Garg
GH Liu
H Heberle
H Oey
HC Kim
HG Drost
IE Bychovskaja-Pavlovskaja
IJ Tsai
J Huerta-Cepas
J Schindelin
JC Chubb
JF Gao
JP Huelsenbeck
K Khalturin
KCW Valero
KF Hoffmann
KL Howe
KV Galaktionov
L Bankers
L Bergmame
L Fu
L Maciel
LF Maciel
M Berriman
M Gouy
MG Grabherr
MY Pomaznoy
N Moran
NA Moran
ND Young
ND Young
ND Young
ND Young
ND Young
NI Ershov
R Leontovyč
R Leontovyč
R Patro
R Smith-Unna
RC Edgar
RM Waterhouse
S Capella-Gutiérrez
S El-Gebali
S Simon
SN McNulty
TR Bürglin
V Choudhary
VF Oliveira
W Liu
X Wang
XX Zhang
Y Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/11/2020

BACKGROUND: Parasitic flatworms (Trematoda: Digenea) represent one of the most remarkable examples of drastic morphological diversity among the stages within a life cycle. Which genes are responsible for extreme differences in anatomy, physiology, behavior, and ecology among the stages? Here we report a comparative transcriptomic analysis of parthenogenetic and amphimictic generations in two evolutionarily informative species of Digenea belonging to the family Psilostomatidae. METHODS: In this study the transcriptomes of rediae, cercariae and adult worm stages of Psilotrema simillimum and Sphaeridiotrema pseudoglobulus were sequenced and analyzed. High-quality transcriptomes were generated, and the reference sets of protein-coding genes were used for differential expression analysis in order to identify stage-specific genes. Comparative analysis of gene sets, their expression dynamics and Gene Ontology enrichment analysis were performed for three life stages within each species and between the two species. RESULTS: Reference transcriptomes for P. simillimum and S. pseudoglobulus include 21,433 and 46,424 sequences, respectively. Among 14,051 orthologous groups (OGs), 1354 are common and specific for two analyzed psilostomatid species, whereas 13 and 43 OGs were unique for P. simillimum and S. pseudoglobulus, respectively. In contrast to P. simillimum, where more than 60% of analyzed genes were active in the redia, cercaria and adult worm stages, in S. pseudoglobulus less than 40% of genes had such a ubiquitous expression pattern. In general, 7805 (36.41%) and 30,622 (65.96%) of genes were preferentially expressed in one of the analyzed stages of P. simillimum and S. pseudoglobulus, respectively. In both species 12 clusters of co-expressed genes were identified, and more than half of the genes belonging to the reference sets were included in these clusters. Functional specialization of the life cycle stages was clearly supported by Gene Ontology enrichment analysis. CONCLUSIONS: During the life cycles of the two species studied, most of the genes change their expression levels considerably; consequently, the molecular signature of a stage is not only a unique set of expressed genes, but also the specific levels of their expression. Our results indicate an unexpectedly high level of plasticity in gene regulation between closely related species. Transcriptomes of P. simillimum and S. pseudoglobulus provide a high-quality reference resource for future evolutionary studies and comparative analyses.